Quarto
Quarto enables you to weave together content and
executable code into a finished document. To learn more about Quarto, see
https://quarto.org.
boys data
The boys data from the mice
package (Van Buuren and Groothuis-Oudshoorn
2011) in R (R Core Team 2023) is a 10% random sample
of the cross-sectional data used to construct the 1997 Dutch growth
references. The variables gen and phb are
ordered factors; reg is an unordered factor.
The following table shows the first 6 rows of the boys
data.
head(mice::boys)
##      age  hgt   wgt   bmi   hc  gen  phb tv   reg
## 3  0.035 50.1 3.650 14.54 33.7 <NA> <NA> NA south
## 4  0.038 53.5 3.370 11.77 35.0 <NA> <NA> NA south
## 18 0.057 50.0 3.140 12.56 35.2 <NA> <NA> NA south
## 23 0.060 54.5 4.270 14.37 36.7 <NA> <NA> NA south
## 28 0.062 57.5 5.030 15.21 37.3 <NA> <NA> NA south
## 36 0.068 55.5 4.655 15.11 37.0 <NA> <NA> NA south
boys data.
# Read data
boys <- read.csv("../data/data.csv")[, -1]
head(boys)
##     age  hgt   wgt   bmi   hc  gen  phb tv   reg
## 1 0.035 50.1 3.650 14.54 33.7 <NA> <NA> NA south
## 2 0.038 53.5 3.370 11.77 35.0 <NA> <NA> NA south
## 3 0.057 50.0 3.140 12.56 35.2 <NA> <NA> NA south
## 4 0.060 54.5 4.270 14.37 36.7 <NA> <NA> NA south
## 5 0.062 57.5 5.030 15.21 37.3 <NA> <NA> NA south
## 6 0.068 55.5 4.655 15.11 37.0 <NA> <NA> NA south
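One point to keep in mind with a CSV import: the file stores no type information, so under R 4.0 or later (where stringsAsFactors defaults to FALSE) gen, phb and reg come back from read.csv() as character columns rather than as factors. The sketch below shows one way to restore the types. It uses a small made-up data frame, and the level sets (G1 to G5, P1 to P6, and five regions) are assumptions taken from the mice documentation that should be checked against levels(mice::boys$gen) and its siblings before use.

```r
# Toy stand-in for the imported data (values are made up for illustration)
toy <- data.frame(
  gen = c("G1", "G3", NA),
  phb = c("P2", NA, "P6"),
  reg = c("south", "north", "city"),
  stringsAsFactors = FALSE
)

# Level sets assumed from the mice documentation; verify before relying on them
gen_levels <- paste0("G", 1:5)
phb_levels <- paste0("P", 1:6)
reg_levels <- c("north", "east", "west", "south", "city")

# Restore the factor types that the CSV round trip dropped
toy$gen <- factor(toy$gen, levels = gen_levels, ordered = TRUE)
toy$phb <- factor(toy$phb, levels = phb_levels, ordered = TRUE)
toy$reg <- factor(toy$reg, levels = reg_levels)

# Inspect the resulting column types
str(toy)
```

The same three factor() calls could be applied to the imported boys object; missing values stay NA throughout.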
boys set is incomplete
Not every value in the mice::boys set is observed. This
may pose problems for the analysis of the boys data. To
get an idea of the extent of the problem, we can inspect the missing data
patterns. Hanne Oberman (2023) created the ggmice package, which we use
here to draw a ggplot2-style (Wickham 2016) plot of the missing data
pattern in the boys data.
library(mice)
##
## Attaching package: 'mice'
## The following object is masked _by_ '.GlobalEnv':
##
## boys
## The following object is masked from 'package:stats':
##
## filter
## The following objects are masked from 'package:base':
##
## cbind, rbind
library(ggmice)
##
## Attaching package: 'ggmice'
## The following objects are masked from 'package:mice':
##
## bwplot, densityplot, stripplot, xyplot
library(magrittr)
# visualize ggplot2-like missing data pattern
mice::boys |>
  ggmice::plot_pattern()
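To make the notion of a missing data pattern concrete, the following base R sketch tabulates the patterns by hand. It is only an illustration of the idea, not how ggmice or mice compute their output: each row is reduced to a string of 1s (observed) and 0s (missing), and identical strings are counted. The data frame toy is made up for the example.

```r
# Made-up example data, not taken from boys
toy <- data.frame(
  hgt = c(50.1, NA, 52.0, 53.5, NA),
  wgt = c(3.6, 3.4, NA, 4.3, 5.0),
  tv  = c(NA, NA, 2, NA, NA)
)

# Response indicator: TRUE where a value is observed
observed <- !is.na(toy)

# Collapse each row into a pattern string such as "110" (hgt, wgt observed; tv missing)
pattern <- apply(observed, 1, function(row) paste(as.integer(row), collapse = ""))

# Count how often each pattern occurs
pattern_counts <- table(pattern)
print(pattern_counts)
```

For the real data, mice::md.pattern() provides a tabular view of the same information that plot_pattern() draws.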
boys data
The boys data contains 748 rows and 9 columns. In total,
there are 1622 missing values in the boys data, with the
highest number of missing values in the tv column.
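Counts like these can be obtained with a few base R calls. The helper below, summarise_missing(), is a hypothetical name introduced here for illustration; it is not part of mice or ggmice. It is demonstrated on a made-up data frame so that it runs without extra packages.

```r
# Hypothetical helper: summarise the size and missingness of a data frame
summarise_missing <- function(data) {
  na_per_column <- colSums(is.na(data))
  list(
    n_rows        = nrow(data),
    n_cols        = ncol(data),
    n_missing     = sum(na_per_column),
    na_per_column = na_per_column,
    most_missing  = names(which.max(na_per_column))
  )
}

# Made-up example data, not taken from boys
toy <- data.frame(
  age = c(0.1, 0.2, 0.3, 0.4),
  hgt = c(50.1, NA, 52.0, 53.5),
  tv  = c(NA, NA, 2, NA)
)

result <- summarise_missing(toy)
print(result)
```

Calling summarise_missing(mice::boys) should reproduce the numbers reported for the boys data, assuming the mice package is installed.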